IAD Days - 2025
# Let's get started: The Basics
Time series data are measurements taken sequentially over time on a variable or variables of interest. Such data are very common and understanding the dynamic nature of time series data is of paramount interest in many fields, particularly for producing forecasts at future times.
| Data | Minute | Hour | Day | Week | Year |
|---|---|---|---|---|---|
| Quarters | | | | | 4 |
| Months | | | | | 12 |
| Weeks | | | | | 52.18 |
| Days | | | | 7 | 365.25 |
| Hours | | | 24 | 168 | 8766 |
| Minutes | | 60 | 1440 | 10080 | 525960 |
| Seconds | 60 | 3600 | 86400 | 604800 | 31557600 |
Half-hourly electricity demand in England and Wales from Monday 5 June 2000 to Sunday 27 August 2000.
A time series can be thought of as a list of numbers (the measurements), along with some information about what times those numbers were recorded (the index). This information can be stored as the following object in R:
- `ts`: Time Series class in base R
- `zoo`: Package for ordered indexed observations
- `xts`: eXtensible Time Series class, an extension of `zoo`
- `tsibble`: Provides a data structure for time series data
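A minimal base-R sketch of storing measurements together with a time index (the values here are hypothetical):

```r
# Hypothetical monthly measurements stored as a base-R ts object
x <- ts(c(123, 39, 78, 52, 110), start = c(2012, 1), frequency = 12)
x  # prints with a Jan-May 2012 index
```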
We may utilize the following R packages for analysis:

- `ts`: Time Series class in base R
- `zoo`: Package for ordered indexed observations
- `xts`: eXtensible Time Series class, an extension of `zoo`
- `tsibble`: Provides a data structure for time series data
- `timeDate`: Package for handling time and date
- `lubridate`: Package for working with dates and times
- `tibbletime`: Extension of `tibble` for time-based data
- `forecast`: Package for forecasting time series data
- `fpp3`: Forecasting: Principles and Practice, the third edition
- `tidyverse`: A collection of packages for data manipulation and visualization in R
- `tsbox`: Provides a set of tools for handling time series in R

# A tsibble: 5,032 x 8 [!]
# Key: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
# ℹ 5,022 more rows
[1] "tbl_ts" "tbl_df" "tbl" "data.frame"
# A tsibble: 4 x 8 [!]
# Key: Symbol [4]
# Groups: Symbol [4]
Symbol Date Open High Low Close Adj_Close Volume
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
Check all preloaded datasets you can use in your active R session
data()

  Period  Value
1 Jan-2002 858654
2 Feb-2002 862338
3 Mar-2002 844551
4 Apr-2002 858240
5 May-2002 850935
6 Jun-2002 846777
'data.frame': 276 obs. of 2 variables:
$ Period: chr "Jan-2002" "Feb-2002" "Mar-2002" "Apr-2002" ...
$ Value : int 858654 862338 844551 858240 850935 846777 847129 839008 832134 839690 ...
Notice the column Period is poorly named and not a date!
https://lubridate.tidyverse.org
`ymd()`, `ymd_hms()`, `dmy()`, `dmy_hms()`, `mdy()`

        time  Value
1 2002-01-01 858654
2 2002-02-01 862338
3 2002-03-01 844551
4 2002-04-01 858240
5 2002-05-01 850935
6 2002-06-01 846777
'data.frame': 276 obs. of 2 variables:
$ time : Date, format: "2002-01-01" "2002-02-01" ...
$ Value: int 858654 862338 844551 858240 850935 846777 847129 839008 832134 839690 ...
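A sketch of this conversion with lubridate; `my()` parses "Jan-2002"-style month-year strings (the two-row stand-in data frame is ours, for illustration):

```r
library(lubridate)
library(dplyr)

# Small stand-in for the data frame above
dat <- data.frame(Period = c("Jan-2002", "Feb-2002"),
                  Value  = c(858654, 862338))

dat <- dat |>
  mutate(time = my(Period)) |>  # my() parses month-year strings into Dates
  select(time, Value)
str(dat)
```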
Let's try to look at a plot by month.
The trend is obscuring information. Try a crude trend filter:
crude_trend = stats::filter(dat$Value, c(1/24, rep(1/12, 11), 1/24), sides = 2)
dat |>
mutate(detrend = Value - crude_trend) |>
mutate(month = month(time)) |> # Add month column
filter(!is.na(detrend)) |>
ggplot(aes(x=time, y=detrend, group=month)) +
geom_line(aes(col=month)) +
  facet_grid(month ~ .)

The seq() function in R understands Date objects:

 [1] "2020-01-01" "2020-02-01" "2020-03-01" "2020-04-01" "2020-05-01"
[6] "2020-06-01" "2020-07-01" "2020-08-01" "2020-09-01" "2020-10-01"
[11] "2020-11-01" "2020-12-01"
[1] "2020-01-01" "2020-01-08" "2020-01-15" "2020-01-22" "2020-01-29"
[6] "2020-02-05" "2020-02-12" "2020-02-19" "2020-02-26" "2020-03-04"
[11] "2020-03-11" "2020-03-18" "2020-03-25" "2020-04-01" "2020-04-08"
[16] "2020-04-15" "2020-04-22" "2020-04-29" "2020-05-06" "2020-05-13"
[21] "2020-05-20" "2020-05-27" "2020-06-03" "2020-06-10" "2020-06-17"
[26] "2020-06-24" "2020-07-01" "2020-07-08" "2020-07-15" "2020-07-22"
[31] "2020-07-29" "2020-08-05" "2020-08-12" "2020-08-19" "2020-08-26"
[36] "2020-09-02" "2020-09-09" "2020-09-16" "2020-09-23" "2020-09-30"
[41] "2020-10-07" "2020-10-14" "2020-10-21" "2020-10-28" "2020-11-04"
[46] "2020-11-11" "2020-11-18" "2020-11-25" "2020-12-02" "2020-12-09"
[51] "2020-12-16" "2020-12-23" "2020-12-30"
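The calls that generate sequences like those shown above might look like:

```r
# Monthly sequence: 12 first-of-month dates in 2020
seq(as.Date("2020-01-01"), by = "month", length.out = 12)

# Weekly sequence: every 7 days through the end of 2020
seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "week")
```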
`tsbox`: `ts_autoplot()`, `ggplot`
# A tibble: 3,038 × 5
date series_id value realtime_start realtime_end
<date> <chr> <dbl> <date> <date>
1 1967-01-07 ICNSA 346000 2025-03-27 2025-03-27
2 1967-01-14 ICNSA 334000 2025-03-27 2025-03-27
3 1967-01-21 ICNSA 277000 2025-03-27 2025-03-27
4 1967-01-28 ICNSA 252000 2025-03-27 2025-03-27
5 1967-02-04 ICNSA 274000 2025-03-27 2025-03-27
6 1967-02-11 ICNSA 276000 2025-03-27 2025-03-27
7 1967-02-18 ICNSA 247000 2025-03-27 2025-03-27
8 1967-02-25 ICNSA 248000 2025-03-27 2025-03-27
9 1967-03-04 ICNSA 326000 2025-03-27 2025-03-27
10 1967-03-11 ICNSA 240000 2025-03-27 2025-03-27
# ℹ 3,028 more rows
# A tibble: 6 × 5
date series_id value realtime_start realtime_end
<date> <chr> <dbl> <date> <date>
1 2025-02-15 ICNSA 223538 2025-03-27 2025-03-27
2 2025-02-22 ICNSA 220856 2025-03-27 2025-03-27
3 2025-03-01 ICNSA 226019 2025-03-27 2025-03-27
4 2025-03-08 ICNSA 214006 2025-03-27 2025-03-27
5 2025-03-15 ICNSA 207398 2025-03-27 2025-03-27
6 2025-03-22 ICNSA 198917 2025-03-27 2025-03-27
`lubridate`

Some of the variation seen in seasonal data may be due to simple calendar effects.

Call:
seas(x = AirPassengers, regression.variables = "td")
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Mon -0.00153 0.00346 -0.44 0.659
Tue -0.00768 0.00361 -2.13 0.033 *
Wed -0.00113 0.00346 -0.32 0.745
Thu -0.00535 0.00343 -1.56 0.118
Fri 0.00468 0.00345 1.36 0.175
Sat 0.00302 0.00357 0.85 0.397
Easter[1] 0.01800 0.00725 2.48 0.013 *
AO1951.May 0.10926 0.01965 5.56 2.7e-08 ***
MA-Seasonal-12 0.50078 0.07725 6.48 9.0e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
SEATS adj. ARIMA: (0 1 0)(0 1 1) Obs.: 144 Transform: log
AICc: 949, BIC: 976 QS (no seasonality in final): 0
Box-Ljung (no autocorr.): 28.1 Shapiro (normality): 0.99
Any data that are affected by population changes can be adjusted to give per-capita data.
Data which are affected by the value of money are best adjusted before modelling.
For example, the average cost of a new house will have increased over the last few decades due to inflation. A $200,000 house this year is not the same as a $200,000 house twenty years ago.
For this reason, financial time series are usually adjusted so that all values are stated in dollar values from a particular year.
print_retail <- aus_retail |>
filter(Industry == "Newspaper and book retailing") |>
group_by(Industry) |>
index_by(Year = year(Month)) |>
summarise(Turnover = sum(Turnover))
aus_economy <- global_economy |>
filter(Code == "AUS")
print_retail |>
left_join(aus_economy, by = "Year") |>
mutate(Adjusted_turnover = Turnover / CPI * 100) |>
pivot_longer(c(Turnover, Adjusted_turnover),
values_to = "Turnover") |>
mutate(name = factor(name,
levels=c("Turnover","Adjusted_turnover"))) |>
ggplot(aes(x = Year, y = Turnover)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y") +
labs(title = "Turnover: Australian print media industry",
       y = "$AU")

A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we will refer to a trend as "changing direction" when it goes from an increasing trend to a decreasing trend.
A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is usually of a fixed and known period.
Consider the aus_production dataset.
Autoregressive Integrated Moving Average (ARIMA) models are the most popular parametric models for time series. They directly model the autocorrelation structure of the series through the values of the model's parameters.
\[ X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t \] where \(\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q\) are unknown parameters of the model. There is one more parameter: the variance of the noise process, \(Var(\epsilon_t) = \sigma^2.\)
\[ X_t = \phi X_{t-1} + \Phi X_{t-12} + \theta \epsilon_{t-1} + \Theta\epsilon_{t-12} + \epsilon_t\]
If \(\{X_t\}\) is a stationary time series, then for all \(s\), the distribution of \((X_t,\dots,X_{t+s})\) does not depend on \(t\).
A stationary series is:
Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:
\[ \begin{align*} \Delta^2 X_{t} & = \Delta X_t - \Delta X_{t - 1} \\ & = (X_t - X_{t-1}) - (X_{t-1}-X_{t-2})\\ & = X_t - 2X_{t-1} +X_{t-2}. \end{align*} \]
A seasonal difference is the difference between an observation and the corresponding observation from the previous year. \[ \Delta_m X_t = X_t - X_{t-m} \] where \(m=\) number of seasons.
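For example, with the monthly AirPassengers series (\(m = 12\)):

```r
# Seasonal (lag-12) difference, then a further first difference
x    <- AirPassengers
d12  <- diff(x, lag = 12)  # X_t - X_{t-12}
dd12 <- diff(d12)          # first difference of the seasonally differenced series
```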
Series: x
ARIMA(2,0,2)(1,0,1)[12] with non-zero mean
Coefficients:
ar1 ar2 ma1 ma2 sar1 sma1 mean
0.721 0.108 -0.681 0.120 0.224 -0.846 0.114
s.e. 0.236 0.206 0.232 0.175 0.111 0.084 0.003
sigma^2 = 0.00359: log likelihood = 266
AIC=-516 AICc=-515 BIC=-490
If \(\Delta_{12}X_t = X_t - X_{t-12}\) denotes seasonally differenced series, then differenced seasonally differenced series is
\[ \begin{align*} \Delta_{12} \Delta \,X_t &= \Delta_{12}(X_t - X_{t-1}) \\ &= (X_t - X_{t-12}) - (X_{t-1} - X_{t-13}) \\ &= X_t - X_{t-1} - X_{t-12} + X_{t-13}. \end{align*} \]
When both seasonal and first differences are applied
It is important that if differencing is used, the differences are interpretable.
\[ y_t = \beta_0 + \beta_1 x_{1, t} + \cdots + \beta_r x_{r, t} + \epsilon_t \]
\[ y_t = \sum_{j=1}^{r}\beta_j z_{t, j} + x_t \]
It is often possible to assume a stationary covariance structure for the error process that corresponds to a linear process and try to find an ARMA representation.
This amounts to deciding on a \(p\) and \(q\), but how do we decide?
\[ M_t = \beta_1 + \beta_2 t + \beta_3 T_t + \beta_4 T_t^2 + \beta_5 P_t + \epsilon_t \]
temp = tempr-mean(tempr) # mean center temp
temp2 = temp^2
trend = time(cmort) # time
fit = lm(cmort~ trend + temp + temp2 + part, na.action=NULL)
summary(fit) # regression results
Call:
lm(formula = cmort ~ trend + temp + temp2 + part, na.action = NULL)
Residuals:
Min 1Q Median 3Q Max
-19.076 -4.215 -0.488 3.744 29.245
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.83e+03 2.00e+02 14.19 < 2e-16 ***
trend -1.40e+00 1.01e-01 -13.82 < 2e-16 ***
temp -4.72e-01 3.16e-02 -14.94 < 2e-16 ***
temp2 2.26e-02 2.83e-03 7.99 9.3e-15 ***
part 2.55e-01 1.89e-02 13.54 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.39 on 503 degrees of freedom
Multiple R-squared: 0.595, Adjusted R-squared: 0.592
F-statistic: 185 on 4 and 503 DF, p-value: <2e-16
Before we look at the residuals of the model and decide what type of autocovariance behavior we see, let's review the known behavior:
| | AR(p) | MA(q) |
|---|---|---|
| ACF | Tails off | Cuts off after lag q |
| PACF | Cuts off after lag p | Tails off |
It would have been a mistake to fit a model assuming uncorrelated errors.
A plot of the residuals indicates an AR(2) stochastic structure
\[ M_t = \beta_1 + \beta_2 t + \beta_3 T_t + \beta_4 T_t^2 + \beta_5 P_t + x_t \]
where
\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + w_t \]
Coefficients:
Estimate SE t.value p.value
ar1 0.3848 0.0436 8.833 0.0000
ar2 0.4326 0.0400 10.806 0.0000
intercept 3075.1482 834.7474 3.684 0.0003
trend -1.5165 0.4227 -3.588 0.0004
temp -0.0190 0.0495 -0.384 0.7014
temp2 0.0154 0.0020 7.612 0.0000
part 0.1545 0.0272 5.680 0.0000
sigma^2 estimated as 26 on 501 degrees of freedom
AIC = 6.13 AICc = 6.13 BIC = 6.2
Daily electricity demand can be modeled as a function of temperature
more electricity is used on cold days due to heating and hot days due to air conditioning
Moderate temperature requires the least amount of electricity
fit <- vic_elec_daily |>
model(ARIMA(Demand ~ Temperature + I(Temperature^2) +
(Day_Type == "Weekday")))
report(fit)

Series: Demand
Model: LM w/ ARIMA(2,1,2)(2,0,0)[7] errors
Coefficients:
ar1 ar2 ma1 ma2 sar1 sar2 Temperature
-0.1093 0.7226 -0.0182 -0.9381 0.1958 0.417 -7.614
s.e. 0.0779 0.0739 0.0494 0.0493 0.0525 0.057 0.448
I(Temperature^2) Day_Type == "Weekday"TRUE
0.1810 30.40
s.e. 0.0085 1.33
sigma^2 estimated as 44.91: log likelihood=-1206
AIC=2432 AICc=2433 BIC=2471
vic_elec_future <- new_data(vic_elec_daily, 14) |>
mutate(
Temperature = 26,
Holiday = c(TRUE, rep(FALSE, 13)),
Day_Type = case_when(
Holiday ~ "Holiday",
wday(Date) %in% 2:6 ~ "Weekday",
TRUE ~ "Weekend"
)
)
forecast(fit, vic_elec_future) |>
autoplot(vic_elec_daily) +
labs(title="Daily electricity demand: Victoria",
       y="GW")
A linear trend can be modelled by simply using \[x_{1,t} = t\] as a predictor,
\[y_t = \beta_0 + \beta_1 t + \epsilon_t\]
A dummy variable is also known as an “indicator variable”.
Example: what if a predictor is a categorical variable taking only two values (e.g., "yes" and "no")?
Use a "dummy variable" that takes the value 1 for "yes" and 0 for "no".
A dummy variable can also be used to account for an outlier in the data.
Rather than omit the outlier, a dummy variable removes its effect.
In this case, the dummy variable takes value 1 for that observation and 0 everywhere else.
| | d1,t | d2,t | d3,t | d4,t | d5,t | d6,t |
|---|---|---|---|---|---|---|
| Monday | 1 | 0 | 0 | 0 | 0 | 0 |
| Tuesday | 0 | 1 | 0 | 0 | 0 | 0 |
| Wednesday | 0 | 0 | 1 | 0 | 0 | 0 |
| Thursday | 0 | 0 | 0 | 1 | 0 | 0 |
| Friday | 0 | 0 | 0 | 0 | 1 | 0 |
| Saturday | 0 | 0 | 0 | 0 | 0 | 1 |
| Sunday | 0 | 0 | 0 | 0 | 0 | 0 |
| Monday | 1 | 0 | 0 | 0 | 0 | 0 |
\[y_t = \beta_0 + \beta_1 t + \beta_2 d_{2,t} + \beta_3 d_{3,t} + \beta_4 d_{4,t} + \epsilon_t,\]
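A sketch of the model fit that produces output like the following, in fpp3 style; the restriction of aus_production to quarters from 1992 onward is an assumption:

```r
library(fpp3)

# Assumed: quarterly beer production from aus_production, 1992 onward
recent_production <- aus_production |>
  filter(year(Quarter) >= 1992)

# Linear model with a trend and quarterly seasonal dummies
fit_beer <- recent_production |>
  model(TSLM(Beer ~ trend() + season()))
report(fit_beer)
```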
Series: Beer
Model: TSLM
Residuals:
Min 1Q Median 3Q Max
-42.9 -7.6 -0.5 8.0 21.8
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 441.8004 3.7335 118.33 < 2e-16 ***
trend() -0.3403 0.0666 -5.11 2.7e-06 ***
season()year2 -34.6597 3.9683 -8.73 9.1e-13 ***
season()year3 -17.8216 4.0225 -4.43 3.4e-05 ***
season()year4 72.7964 4.0230 18.09 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.2 on 69 degrees of freedom
Multiple R-squared: 0.924, Adjusted R-squared: 0.92
F-statistic: 211 on 4 and 69 DF, p-value: <2e-16
augment(fit_beer) |>
ggplot(aes(x = Quarter)) +
geom_line(aes(y = Beer, colour = "Data")) +
geom_line(aes(y = .fitted, colour = "Fitted")) +
scale_colour_manual(
values = c(Data = "black", Fitted = "#D55E00")
) +
labs(y = "Megalitres",
title = "Australian quarterly beer production") +
  guides(colour = guide_legend(title = "Series"))

Often useful to include advertising expenditure as a predictor.
The effect of advertising can last beyond the actual campaign
Need to include lagged values of advertising expenditure
\(x_1\) = advertising for previous month;
\(x_2\) = advertising for two months previously;
\(\vdots\)
\(x_m\) = advertising for \(m\) months previously.
Some holidays can move around in the calendar
If \(m\) is the seasonal period, then the first few Fourier terms are given by
\[ x_{1,t} = \sin\!\left(\frac{2\pi t}{m}\right), \quad x_{2,t} = \cos\!\left(\frac{2\pi t}{m}\right), \quad x_{3,t} = \sin\!\left(\frac{4\pi t}{m}\right), \quad x_{4,t} = \cos\!\left(\frac{4\pi t}{m}\right), \ldots \]
\[ x_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + w_t \]
An extension of polynomial regression is to first divide time \(t = 1, \ldots, n\) into \(k\) intervals where \(t_1, \ldots, t_k\) are called knots.
Think of taking a long drive where \(m_t\) is the position of your car at time \(t\). In this case, \(m_t^{''}\) is instantaneous acceleration/deceleration, and \(\int (m_t^{''})^2 \, dt\) is a measure of the total amount of acceleration and deceleration on your trip. A smooth drive would be one where a constant velocity is maintained (i.e., \(m_t^{''} = 0\)). A choppy ride would be when the driver is constantly accelerating and decelerating, such as beginning drivers tend to do.
If \(\lambda = 0\), we don’t care how choppy the ride is, and this leads to \(m_t = x_t\), the data, which are not smooth. If \(\lambda = \infty\), we insist on no acceleration or deceleration (\(m'' = 0\)); in this case, our drive must be at constant velocity, \(m_t = c + vt\), and consequently very smooth. Thus, \(\lambda\) is seen as a trade-off between linear regression (completely smooth) and the data itself (no smoothness). The larger the value of \(\lambda\), the smoother the fit.
In R, the smoothing parameter is called spar and it is monotonically related to \(\lambda\)
type ?smooth.spline to view the help file for details.
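A sketch of fitting smoothing splines at two `spar` values, assuming astsa's `cmort` mortality series; small `spar` gives a choppier fit, large `spar` approaches a line:

```r
library(astsa)

# Plot the mortality series and overlay two smoothing splines
plot(cmort, type = "p", pch = 20, col = "gray", ylab = "Mortality")
lines(smooth.spline(time(cmort), cmort, spar = 0.5), col = 4)  # follows the data
lines(smooth.spline(time(cmort), cmort, spar = 1.0), col = 2)  # very smooth
```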
`na.spline` from `zoo`

Basic Idea
\[ \widehat{y}_{T+h | T} = y_T \]
\[ \widehat{y}_{T+h | T} = \frac{1}{T} \sum_{t=1}^{T} y_t \]
These are both quite extreme forecasts!
We often want something between these two extremes.
Attach larger weights to more recent observations than to observations from the distant past.
This is exactly the concept behind simple exponential smoothing
\[ \widehat{y}_{T+1 | T} = \alpha y_T + \alpha(1-\alpha)y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots \]
where \(0 \leq \alpha \leq 1\)
Weighted average form
Component form
\[ \hat{y}_{T+1|T} = \alpha y_T + (1-\alpha) \hat{y}_{T|T-1} \]
\[ \begin{align*} \text{Forecast equation} && \hat{y}_{t+h|t} & = \ell_{t}\\ \text{Smoothing equation} && \ell_{t} & = \alpha y_{t} + (1 - \alpha)\ell_{t-1}, \end{align*} \]
For simple exponential smoothing, we need to select the values of \(\alpha\) and \(\ell_0\).
The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimising the SSE:
\[ \text{SSE}=\sum_{t=1}^T(y_t - \hat{y}_{t|t-1})^2=\sum_{t=1}^Te_t^2. \]
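A minimal sketch of the smoothing recursion and its SSE (the function name `ses_fit` is ours, not from a package):

```r
# Simple exponential smoothing by direct recursion
ses_fit <- function(y, alpha, l0) {
  l <- l0
  fitted <- numeric(length(y))
  for (t in seq_along(y)) {
    fitted[t] <- l                       # one-step forecast y-hat_{t|t-1}
    l <- alpha * y[t] + (1 - alpha) * l  # smoothing equation
  }
  list(fitted = fitted, sse = sum((y - fitted)^2), level = l)
}

ses_fit(c(1, 2, 3), alpha = 0.5, l0 = 1)$sse  # 3.25
```

In practice these values are found by numerical optimisation over \(\alpha\) and \(\ell_0\).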
\[ \begin{align*} \text{Forecast equation}&& \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} \\ \text{Level equation} && \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend equation} && b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)b_{t-1}, \end{align*} \]
\(\ell_t\) denotes an estimate of the level of the series at time t
\(b_t\) denotes an estimate of the trend (slope) of the series at time t.
\(\alpha\) is the smoothing parameter
\(\beta^*\) smoothing parameter for the trend
This method also includes a damping parameter \(0 < \phi < 1\)
If \(\phi = 1\) the method is identical to Holt’s linear method
\(\phi\) dampens the trend so that it approaches a constant some time in the future
\[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t} \\ \ell_{t} &= \alpha y_{t} + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 -\beta^*)\phi b_{t-1}. \end{align*} \]
aus_economy |>
model(
`Holt's method` = ETS(Pop ~ error("A") +
trend("A") + season("N")),
`Damped Holt's method` = ETS(Pop ~ error("A") +
trend("Ad", phi = 0.9) + season("N"))
) |>
forecast(h = 15) |>
autoplot(aus_economy, level = NULL) +
labs(title = "Australian population",
y = "Millions") +
  guides(colour = guide_legend(title = "Forecast"))

Holt (1957) and Winters (1960) extended Holt's method to capture seasonality.
The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations
\[ \begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m}, \end{align*} \]
\[ \begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t}-\ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + b_{t-1})} + (1 - \gamma)s_{t-m}. \end{align*} \]
aus_holidays <- tourism |>
filter(Purpose == "Holiday") |>
summarise(Trips = sum(Trips)/1e3)
fit <- aus_holidays |>
model(
additive = ETS(Trips ~ error("A") + trend("A") +
season("A")),
multiplicative = ETS(Trips ~ error("M") + trend("A") +
season("M"))
)
fc <- fit |> forecast(h = "3 years")
fc |>
autoplot(aus_holidays, level = NULL) +
labs(title="Australian domestic tourism",
y="Overnight trips (millions)") +
  guides(colour = guide_legend(title = "Forecast"))

Point forecasts can be obtained from the models by iterating the equations.
For example, \(\widehat{y}_{T+2|T} = \ell_T + 2b_T\) and so on.
Using forecast() function from fable package:
The prediction intervals will differ between models with additive and multiplicative methods
For most ETS models, a prediction interval can be written as:
\[ \widehat{y}_{T+h | T} \pm c\sigma_h \]
where \(c\) depends on coverage probability and \(\sigma_h\) is the forecast variance.
The observations \(y_t\) depend on the value of an unobserved state \(x_t\).
We want to make inference about \(x_t\)
Diagram of a state space model
State Equation: p-dimensional \[ x_t = \Phi x_{t-1} + w_t \]
Observation Equation: q-dimensional \[ y_t = A_t x_t + v_t \]
State Equation: p-dimensional \[ x_t = \Phi x_{t-1} + \Upsilon u_t + w_t \]
Observation Equation: q-dimensional \[ y_t = A_t x_t + \Gamma u_t + v_t \]
\[ y_{t,1} = x_t + v_{t,1} \quad \text{and} \quad y_{t,2} = x_t + v_{t,2} \]
\[ \begin{pmatrix} y_{t, 1} \\ y_{t, 2} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} x_t + \begin{pmatrix} v_{t, 1} \\ v_{t, 2} \end{pmatrix} \]
\[ x_t = \delta + x_{t-1} + w_t \]
This example has:
Suppose we consider the problem of monitoring the level of several biomedical markers after a cancer patient undergoes a bone marrow transplant.
measurements made for 91 days on
Approximately 40% of the values are missing, with missing values occurring primarily after the 35th day.
“Platelet count at about 100 days post transplant has previously been shown to be a good indicator of subsequent long term survival.”
State Equation:
\[ \begin{pmatrix} x_{t, 1} \\ x_{t, 2} \\ x_{t, 3} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} \\ \phi_{21} & \phi_{22} & \phi_{23} \\ \phi_{31} & \phi_{32} & \phi_{33} \end{pmatrix} \begin{pmatrix} x_{t-1, 1} \\ x_{t-1, 2} \\ x_{t-1, 3} \end{pmatrix} + \begin{pmatrix} w_{t, 1} \\ w_{t, 2} \\ w_{t, 3} \end{pmatrix} \]
Observation Equation:
\[ y_t = A_t x_t + v_t \]
where the \(A_t\) matrix is \(I\) if day \(t\) is observed and \(0\) matrix if missing.
Consider the problem of trying to predict \(x_t\) from \(y_{1:s} = \{y_1, y_2, \ldots, y_s\}\).
If you can put the problem into SSF you can use the Kalman Filter for the following:
\(s < t\) this is forecasting or prediction
\(s = t\) this is filtering
\(s > t\) this is smoothing
State Equation:
\[ \mu_t = \mu_{t-1} + w_t \]
with \(w_t \sim N(0, 1)\) and \(\mu_0 \sim N(0, 1)\).
Observation Equation:
\[ y_t = \mu_t + v_t \]
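A simulation of this local level model in base R (seed and series length are our choices); `mu[-1]` drops the initial state so `y` aligns with \(\mu_1, \dots, \mu_{50}\):

```r
set.seed(1)
num <- 50
w  <- rnorm(num + 1)   # state noise; first draw serves as mu_0 ~ N(0, 1)
v  <- rnorm(num)       # observation noise
mu <- cumsum(w)        # states mu_0, mu_1, ..., mu_50 (random walk)
y  <- mu[-1] + v       # observations y_1, ..., y_50
```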
# filter and smooth (Ksmooth does both)
mu0 = 0; sigma0 = 1; phi = 1; sQ = 1; sR = 1
ks = Ksmooth(y, A=1, mu0, sigma0, phi, sQ, sR)
names(ks)

 [1] "Xs"    "Ps"    "X0n"   "P0n"   "J0"    "J"     "Xp"    "Pp"
[9] "Xf" "Pf" "like" "innov" "sig" "Kn"
par(mfrow=c(3,1))
tsplot(mu[-1], type='p', col=4, pch=19, ylab=expression(mu[~t]), main="Prediction", ylim=c(-5,10))
lines(ks$Xp, col=6)
lines(ks$Xp+2*sqrt(ks$Pp), lty=6, col=6)
lines(ks$Xp-2*sqrt(ks$Pp), lty=6, col=6)
tsplot(mu[-1], type='p', col=4, pch=19, ylab=expression(mu[~t]), main="Filter", ylim=c(-5,10))
lines(ks$Xf, col=6)
lines(ks$Xf+2*sqrt(ks$Pf), lty=6, col=6)
lines(ks$Xf-2*sqrt(ks$Pf), lty=6, col=6)
tsplot(mu[-1], type='p', col=4, pch=19, ylab=expression(mu[~t]), main="Smoother", ylim=c(-5,10))
lines(ks$Xs, col=6)
lines(ks$Xs+2*sqrt(ks$Ps), lty=6, col=6)
lines(ks$Xs-2*sqrt(ks$Ps), lty=6, col=6)

Parameters are \(\phi, \sigma_w, \sigma_v\)
Simulate values with true parameters:
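One possible simulation, assuming true values \(\phi = 0.8\) and \(\sigma_w = \sigma_v = 1\) (the seed is our choice):

```r
# AR(1) state observed with noise: x_t = phi x_{t-1} + w_t, y_t = x_t + v_t
set.seed(999)
num <- 100
x <- arima.sim(n = num + 1, list(ar = 0.8), sd = 1)  # latent AR(1) state
y <- ts(x[-1] + rnorm(num, 0, 1))                    # noisy observations
```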
u = ts.intersect(y, stats::lag(y,-1), stats::lag(y,-2))
varu = var(u)
coru = cor(u)
phi = coru[1,3]/coru[1,2]
q = (1-phi^2)*varu[1,2]/phi
r = varu[1,1] - q/(1-phi^2)
(init.par = c(phi, sqrt(q), sqrt(r)))
[1] 0.837 0.796 1.116
Estimation via `optim()`:

Linn = function(para){
sQ = para[1] # sigma_w
sR1 = para[2] # 11 element of sR
sR2 = para[3] # 22 element of sR
sR21 = para[4] # 21 element of sR
sR = matrix(c(sR1,sR21,0,sR2), 2) # put the matrix together
drift = para[5]
kf = Kfilter(y,A,mu0,Sigma0,Phi,sQ,sR,Ups=drift,Gam=NULL,input) # NOTE Gamma is set to NULL now (instead of 0)
return(kf$like)
}

init.par = c(.1,.1,.1,0,.05) # initial values of parameters
est = optim(
init.par,
Linn,
method = "BFGS",
hessian = TRUE
)
SE = sqrt(diag(solve(est$hessian)))
# Summary of estimation
estimate = est$par; u = cbind(estimate, SE)
rownames(u)=c("sigw","sR11", "sR22", "sR21", "drift")
u

      estimate      SE
sigw -0.04760 0.01092
sR11 0.50111 0.02971
sR22 -0.10185 0.00891
sR21 0.00304 0.01457
drift 0.00502 0.00367
tsplot(
y,
spag = TRUE,
margins = .5,
type = 'o',
pch = 2:3,
col = 4:3,
lty = 6,
ylab = 'Temperature Deviations'
)
xsm = ts(as.vector(ks$Xs), start = 1850)
rmse = ts(sqrt(as.vector(ks$Ps)), start = 1850)
lines(xsm, lwd = 2, col = 6)
xx = c(time(xsm), rev(time(xsm)))
yy = c(xsm - 2 * rmse, rev(xsm + 2 * rmse))
polygon(xx, yy, border = NA, col = gray(.6, alpha = .25))

Structural models are component models in which each component may be thought of as explaining a specific type of behavior.
Models are often some version of the classical time series decomposition of data into trend, seasonal, and irregular components.
For example,
\[ y_t = T_t + S_t + v_t \]
Putting a structural component model in SSF can be seen through an example.
Consider Johnson & Johnson quarterly earnings
We pose the following models for the trend and seasonal components:
\[ T_t = \phi T_{t-1} + w_{t, 1} \]
\[ S_t + S_{t-1} + S_{t-2} + S_{t-3} = w_{t, 2} \]
How do we put this in SSF?
\[ y_t = T_t + S_t + v_t \]
Observation Equation:
\[ y_t = \begin{pmatrix} 1 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} T_{t} \\ S_{t} \\ S_{t-1} \\ S_{t-2} \end{pmatrix} + v_t \]
State Equation:
\[ \begin{pmatrix} T_{t} \\ S_{t} \\ S_{t-1} \\ S_{t-2} \end{pmatrix} = \begin{pmatrix} \phi & 0 & 0 & 0 \\ 0 & -1 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} T_{t-1} \\ S_{t-1} \\ S_{t-2} \\ S_{t-3} \end{pmatrix} + \begin{pmatrix} w_{t, 1} \\ w_{t, 2} \\ 0 \\ 0 \end{pmatrix} \]
where
\[ Q = \begin{pmatrix} q_{11} & 0 & 0 & 0 \\ 0 & q_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
num = length(jj)
A = cbind(1,1,0,0)
# Function to Calculate Likelihood
Linn = function(para){
Phi = diag(0,4)
Phi[1,1] = para[1]
Phi[2,]=c(0,-1,-1,-1); Phi[3,]=c(0,1,0,0); Phi[4,]=c(0,0,1,0)
sQ1 = para[2]; sQ2 = para[3] # sqrt q11 and sqrt q22
sQ = diag(0,4); sQ[1,1]=sQ1; sQ[2,2]=sQ2
sR = para[4] # sqrt r11
kf = Kfilter(jj, A, mu0, Sigma0, Phi, sQ, sR)
return(kf$like)
}

# Initial Parameters
mu0 = c(.7,0,0,0)
Sigma0 = diag(.04, 4)
init.par = c(1.03, .1, .1, .5) # Phi[1,1], the 2 Qs and R
# Estimation
est = optim(
init.par,
Linn,
method = "BFGS",
hessian = TRUE
)
SE = sqrt(diag(solve(est$hessian)))
u = cbind(estimate = est$par, SE)
rownames(u) = c("Phi11", "sigw1", "sigw2", "sigv")
u

      estimate      SE
Phi11 1.035085 0.00254
sigw1 0.139726 0.02155
sigw2 0.220878 0.02376
sigv 0.000466 0.24175
# Smooth
Phi = diag(0,4)
Phi[1,1] = est$par[1]; Phi[2,] = c(0,-1,-1,-1)
Phi[3,] = c(0,1,0,0); Phi[4,] = c(0,0,1,0)
sQ = diag(0,4)
sQ[1,1] = est$par[2]
sQ[2,2] = est$par[3]
sR = est$par[4]
ks = Ksmooth(jj, A, mu0, Sigma0, Phi, sQ, sR)
# Plots
Tsm = ts(ks$Xs[1,,], start=1960, freq=4)
Ssm = ts(ks$Xs[2,,], start=1960, freq=4)
p1 = 3*sqrt(ks$Ps[1,1,]); p2 = 3*sqrt(ks$Ps[2,2,])
par(mfrow=c(2,1))
tsplot(Tsm, main='Trend Component', ylab='Trend')
xx = c(time(jj), rev(time(jj)))
yy = c(Tsm-p1, rev(Tsm+p1))
polygon(xx, yy, border=NA, col=gray(.5, alpha = .3))
tsplot(jj, main='Data & Trend+Season', ylab='J&J QE/Share', ylim=c(-.5,17))
xx = c(time(jj), rev(time(jj)) )
yy = c((Tsm+Ssm)-(p1+p2), rev((Tsm+Ssm)+(p1+p2)) )
polygon(xx, yy, border=NA, col=gray(.5, alpha = .3))

# Forecast
n.ahead = 12
y = ts(append(jj, rep(0,n.ahead)), start=1960, freq=4)
rmspe = rep(0,n.ahead)
x00 = ks$Xf[,,num]
P00 = ks$Pf[,,num]
Q = sQ%*%t(sQ)
R = sR%*%t(sR)
for (m in 1:n.ahead){
xp = Phi%*%x00
Pp = Phi%*%P00%*%t(Phi)+Q
sig = A%*%Pp%*%t(A)+R
K = Pp%*%t(A)%*%(1/sig)
x00 = xp
P00 = Pp-K%*%A%*%Pp
y[num+m] = A%*%xp
rmspe[m] = sqrt(sig)
}
tsplot(y, type='o', main='', ylab='J&J QE/Share', ylim=c(5,30), xlim = c(1975,1984))
upp = ts(y[(num+1):(num+n.ahead)]+2*rmspe, start=1981, freq=4)
low = ts(y[(num+1):(num+n.ahead)]-2*rmspe, start=1981, freq=4)
xx = c(time(low), rev(time(upp)))
yy = c(low, rev(upp))
polygon(xx, yy, border=8, col=gray(.5, alpha = .3))
abline(v=1981, lty=3)

Neural networks allow complex nonlinear relationships between the response variable and its predictors.
A neural network can be thought of as a network of “neurons” which are organised in layers
The predictors (or inputs) form the bottom layer, and the forecasts (or outputs) form the top layer. There may also be intermediate layers containing “hidden neurons”.
A simple neural network equivalent to a linear regression.
A neural network with four inputs and one hidden layer with three hidden neurons.
The inputs into each hidden neuron in previous figure are combined linearly to give \[ z_j = b_j + \sum_{i=1}^4 w_{i,j} x_i. \]
In the hidden layer, this is then modified using a nonlinear function such as:
Sigmoid: \[ s(z) = \frac{1}{1+e^{-z}}, \]
ReLu (rectified linear unit) \[ f(x) = max(0, x) \]
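These two activation functions in R (the function names are ours, for illustration):

```r
sigmoid <- function(z) 1 / (1 + exp(-z))  # squashes z into (0, 1)
relu    <- function(z) pmax(0, z)         # zero for negative inputs, identity otherwise

sigmoid(0)      # 0.5
relu(c(-2, 3))  # 0 3
```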
Data Prep
The NNETAR() function fits an NNAR\((p,P,k)_m\) model.

# A mable: 1 x 1
`NNETAR(sqrt(value))`
<model>
1 <NNAR(9,5)>
The neural network fitted to the sunspot data can be written as
\[ y_t = f({\bf y}_{t-1}) + \varepsilon_t \]
where \({\bf y}_{t-1} = (y_{t-1},y_{t-2},\dots,y_{t-9})'\)
Here is a simulation of 9 possible future sample paths for the sunspot data. Each sample path covers the next 30 years after the observed data.
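A sketch of producing such sample paths with fable's `generate()`, assuming the annual sunspot numbers are converted to a tsibble as above:

```r
library(fpp3)

# Assumed: annual sunspot numbers as a tsibble
sunspots <- as_tsibble(sunspot.year)
fit <- sunspots |> model(NNETAR(sqrt(value)))

# Nine simulated future sample paths, 30 years ahead
fit |>
  generate(times = 9, h = 30) |>
  autoplot(.sim) +
  autolayer(sunspots, value)
```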
The forecast() function produces prediction intervals for NNAR models.

The times argument in forecast() controls how many simulations are done (default 1000).

Standard Recurrent NN
LSTM
Keras is a high-level API to build and train deep learning models. It’s used for fast prototyping, advanced research, and production, with three key advantages:
Note
You might need to activate your conda environment using conda activate rstudio
Note
You might need to install reticulate and tensorflow prior to keras
array([[[1., 2.],
[3., 4.]],
[[5., 6.],
[7., 8.]]])
Build the `model` object:

library(keras)
model_rnn <- keras_model_sequential() %>%
layer_simple_rnn(units = 4, input_shape = c(num_timesteps, num_features)) %>%
layer_dense(units = 1)
model_rnn %>% compile(
optimizer = optimizer_rmsprop(),
loss = "mae"
) #model building
history_rnn <- model_rnn %>% fit(
  x_train,
  y_train,
  batch_size = 40,
  epochs = 100,
  validation_split = 0.2
) # model training
plot(history_rnn)